The SIMT Execution Model and Warp Partitioning
AI032 Lesson 6

The SIMT (Single-Instruction, Multiple-Thread) model is the heartbeat of GPU architecture. While you program individual threads, you organize them into a two-level hierarchy of grids and blocks. To maximize efficiency, the hardware further partitions each block into 32-thread units called warps.

1. SIMT vs. SIMD

Unlike CPU SIMD (e.g., SSE/AVX), where the programmer manually packs data into fixed-width vector registers, SIMT lets each thread appear independent. The hardware automatically groups threads into warps and fetches a single instruction that all 32 threads execute in lockstep.

2. Linearization Rule

Programmers use threadIdx.x, .y, and .z for logic, but the hardware flattens these into a 1D sequence for scheduling:

Index = x + (y × blockDim.x) + (z × blockDim.x × blockDim.y)
For a 2D block of 8 × 8 threads: Warp 0 holds IDs 0-31, and Warp 1 holds IDs 32-63.

Because the x-dimension is the fastest-varying index, threads with consecutive threadIdx.x values usually land in the same warp, which is critical for memory coalescing.
